Abstract


With advances in fluorescence imaging technologies, cell biologists are now able to
record the movement of protein vesicles within a living cell. Automatic tracking of
these movements has become key to the qualitative analysis of vesicle dynamics.
In this thesis, we formulate this tracking problem as a video object tracking problem
and design a dynamic programming method for tracking a single object. Our experiments
on simulated data show that the method identifies a track with high accuracy and
is robust to the choice of tracking parameters and to the presence of a high level of noise. We then
extend this method to tracking multiple objects using a track elimination strategy.
In multiple object tracking, this approach often fails to identify a track correctly
when two tracks cross. We solve this problem by incorporating the Kalman filter into
the dynamic programming framework. Our experiments on simulated data show that the
tracking accuracy is significantly improved.

Chapter 1


Introduction 


Every cell must communicate with the world around it. Eucaryotic cells (i.e. cells of
animals, plants, and fungi) have an internal membrane system that allows them to regulate
the delivery of newly synthesized proteins to the cell exterior. The biosynthetic-secretory
pathway allows the cell to modify the molecules it produces in a series of steps, store them
until needed, and then deliver them to the exterior. This delivery takes place through protein
vesicles, which are small bubbles of liquid within a cell. Figure 1.1 illustrates the typical
biosynthetic-secretory pathways in a cell. In the figure, each compartment encloses a
space, called a lumen, that is topologically equivalent to the outside of the cell, and all
compartments shown communicate with one another and with the outside of the cell by means
of transport vesicles. In the biosynthetic-secretory pathway (red arrows), protein molecules
are transported from the endoplasmic reticulum (ER) to the plasma membrane or (via
late endosomes) to lysosomes. Some molecules are retrieved from the late endosome and
returned to the Golgi apparatus, and some are retrieved from the Golgi apparatus and
returned to the ER. The figure and caption are adapted from [ABL+02].

Figure 1.1: The intracellular compartments of the eucaryotic cell involved in the
biosynthetic-secretory pathways.

Fluorescence microscopy is a main tool for studying biosynthetic-secretory processes.
A cell is normally optically transparent. To visualize the molecules of a protein of interest,
the molecules can be labeled with a fluorescent dye. When excited by light of a particular
wavelength, the dye emits light of another wavelength that can be detected. Thus,
the location of the protein molecules in a cell can be identified. In recent years, the
resolution of this localization has increased considerably through advances in confocal
microscopy techniques. In addition, the discovery of fluorescent proteins enables biologists
to observe the dynamics of proteins in individual living cells. So far, little has been done
on the automatic analysis of the dynamics of protein molecules. Here we focus on tracking
the movement of protein vesicles in a microscopy image sequence (i.e. a video).

Suppose the microscopy images are grey-scale images. After the static structure is removed
from an image sequence, a protein vesicle becomes a spot-like object that is brighter (i.e. has
a higher value) than the dark background. Figure 1.2 shows such images, in which the
majority of the static cell structure is suppressed; the video was obtained from [KGH+07].
There are two main challenges in this kind of tracking: 1) the high level of noise in the images
and 2) the tracking of multiple objects.



Figure 1.2: Three consecutive fluorescence microscopy images showing protein vesicles.

Chapter 2


Literature review 


2.1 Object tracking methods 

Object tracking is an important first step in the automatic analysis of cellular dynamics.
Object tracking was originally developed in the field of computer vision. Basically,
tracking can be defined as the problem of estimating the trajectory of an object in the
image plane as the object moves around a scene. In practice, there are many difficulties in
building a successful tracking algorithm. The difficulties related to protein vesicle tracking
are 1) noise in images, 2) complex object motion, and 3) partial and full object occlusions.
These problems can sometimes be simplified by incorporating prior knowledge of the
objects. However, so far little is known about the characteristics of molecular dynamics
in cells.

Numerous methods for object tracking have been proposed in the field of computer
vision. They mainly differ in the following aspects [YJS06]: 1) object representation, 2)
image features used, and 3) modeling of the motion, appearance, and shape of the object. These
methods address the above aspects according to the context/environment in which the
tracking is performed and the tracking information needed for subsequent analysis.

2.1.1 Object representation 

The shape of an object can be represented in different ways: 1) points, i.e., the centroid
[VRB01] or a finite set of points [SMVG04]; 2) primitive geometric shapes (e.g. ellipse,
rectangle) [CRM03]; 3) object silhouette and contour [YLS04]; 4) articulated shape
models, i.e., shapes held together with joints; 5) skeleton [BB82]. Since in our case
the protein vesicles occupy only small regions in an image, we normally represent these
vesicles as points.

In combination with shape representations, there are a number of ways to represent the
appearance features of objects. Appearance features include: 1) probability
densities, which can be either parametric [ZY96] or nonparametric [EDHD02]; 2) active
appearance models [ETC98], which associate landmarks with feature vectors of color, texture,
etc.; 3) multiview appearance models, which are generated from subspaces of given views
using techniques such as Principal Component Analysis (PCA) and Independent Component
Analysis (ICA) [MP97].

2.1.2 Object detection 

A tracking method needs to be able to determine the existence of an object either in every frame
or in the frame in which the object first appears in the video. Some detection methods use
the temporal information computed from a sequence of frames to reduce false detections.
The following are several common types of methods used for object detection.

Point Detectors: point detectors are used to identify points that belong to an object,
based on local context. Commonly used point detectors are Moravec’s interest operator, the
Harris interest point detector, the KLT detector, and the SIFT detector [MS02].

Background Subtraction: object detection can also be achieved by constructing a
background model and then finding deviations from the model. A significant change in an
image region from the background model indicates a moving object. This process
is called background subtraction. Owing to the increasing ability to model
complex backgrounds efficiently, most recent tracking methods for fixed cameras use background
subtraction to detect regions of interest (for example [HHD00], [CLFK01]).
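As a minimal sketch of background subtraction for a fixed camera, the snippet below estimates the static background as the per-pixel temporal median and keeps only positive deviations from it. The median model and the function name `subtract_median_background` are assumptions made for illustration; the cited methods use more elaborate background models.

```python
from statistics import median

def subtract_median_background(frames):
    """Remove static structure from a list of equally sized 1-D frames.

    The background at each pixel is taken to be the temporal median
    (an illustrative choice); negative residuals are clipped to zero.
    """
    n_pixels = len(frames[0])
    background = [median(frame[p] for frame in frames) for p in range(n_pixels)]
    return [[max(frame[p] - background[p], 0.0) for p in range(n_pixels)]
            for frame in frames]

# A static structure of intensity 0.3 at pixel 1 and a spot moving from pixel 2 to pixel 4.
frames = [
    [0.0, 0.3, 0.5, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.5, 0.0],
    [0.0, 0.3, 0.0, 0.0, 0.5],
]
foreground = subtract_median_background(frames)
```

In this toy input the static pixel is suppressed in every frame while the moving spot keeps its intensity, which is the situation assumed for the vesicle videos after static structure has been removed.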

Segmentation: image segmentation algorithms are used to partition the image into
perceptually similar regions. A segmentation algorithm normally defines criteria for a good
partition and provides a method for efficient partitioning. Several types of segmentation
methods have been designed for tracking purposes: 1) mean shift clustering [CM99], which
performs clustering in the joint spatial and color space; 2) graph cutting, which converts
the image into a graph and uses a min-cut algorithm to find disjoint regions [SM00]; 3)
active contours, which evolve a closed contour toward the object’s boundary, guided by
a certain energy function (for example [KWT88]).

Supervised Learning: in this approach, object detection is formulated as a classification
problem. A learning algorithm is used to generate models from the features of the
objects in training videos in which the objects are known. These models are then used
to predict the existence of objects in new video data. When a set of image features is
properly chosen, a number of different classification methods can be used, such as neural
networks [RBK98], adaptive boosting [VJS05], decision trees [GK95], and support
vector machines [POP98].

2.1.3 Object tracking 

Object tracking generates the trajectory of an object over time by locating its
position in every frame of the video. There are two tasks in object tracking: detecting
the object and establishing the association of objects between frames. These two tasks can
be performed separately or jointly. Different types of tracking methods have been developed
for different object representations. They fall into three categories: 1) point tracking, which
represents objects as points and only estimates the object’s position in each frame; 2)
kernel tracking, which uses object shape and appearance and considers not only translation
but also rotation of objects; 3) silhouette tracking, which uses template matching
to identify objects in each frame. Because the protein vesicles in our project are mainly
represented as points, in this section we describe only point tracking methods.

In point tracking, tracking can be formulated as the association of detected objects,
represented by points, across frames. In general, there are two types of methods for associating
points: deterministic and statistical methods.

Deterministic methods usually define a cost of associating each object in frame t − 1 with
a single object in frame t using a set of motion constraints. Minimization of the association
cost is formulated as a combinatorial optimization problem. Optimal assignment methods
(for example, the Hungarian algorithm [Kuh55]) obtain the best one-to-one association
among all possible associations. The association cost is usually defined by
using a combination of the following constraints: 1) object displacement between frames;
2) maximum velocity; 3) small velocity change; 4) common motion, which requires the
velocity of objects in a small neighborhood to be similar; 5) rigidity, which assumes that
objects are rigid.
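The one-to-one association can be illustrated with a small sketch. It assumes one-dimensional point locations, a cost equal to the displacement between frames only, and equally many points in both frames. For brevity it searches all permutations instead of running the Hungarian algorithm, which reaches the same optimum in polynomial time; the function name `associate_points` is hypothetical.

```python
from itertools import permutations

def associate_points(prev_points, curr_points):
    """Return the one-to-one assignment with minimum total displacement.

    Brute force over all permutations; it stands in for the Hungarian
    algorithm, which finds the same optimum in polynomial time.
    Assumes equally many points in both frames (no occlusion, entry, or exit).
    """
    best_cost, best_assignment = float("inf"), None
    for perm in permutations(range(len(curr_points))):
        cost = sum(abs(prev_points[i] - curr_points[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_assignment = cost, perm
    return list(best_assignment), best_cost

# Three points in frame t-1 and their detections, in arbitrary order, in frame t.
assignment, cost = associate_points([10.0, 50.0, 90.0], [52.0, 88.0, 11.0])
```

Here `assignment[i]` is the index of the detection in frame t that is linked to point i of frame t − 1. Additional constraints such as maximum velocity could be added by assigning an infinite cost to forbidden pairs.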

Several such methods have been proposed. Sethi and Jain [SJ87] proposed a greedy method
to solve the association problem. It is based on the proximity and rigidity constraints
applied to two consecutive frames. Salari and Sethi [SS90] designed a method that
establishes associations for the detected points and then extends the tracking of the missing
objects by adding hypothetical points. Building on their work, Veenman et al. [VRB01] further
added the common motion constraint. Shafique and Shah [SS05] use the temporal
coherency of speed and position over multiple frames as constraints. In their approach, the
association problem is converted to finding the best unique path for each point on a
graph.

Compared to deterministic methods, statistical association methods reduce
the effect of noise in the video and of perturbations in object movements by
incorporating randomness into the model. They use the state space approach to model
object properties such as position, velocity, and acceleration.

For single object tracking, a typical statistical association method is the Kalman filter
[BC86], which assumes that the transition of system states is linear and the noise is Gaussian.
We will use the Kalman filter to enhance our tracking method. One limitation of the Kalman filter
is the assumption that the state variables follow a Gaussian distribution (see the Method section for
details). Particle filtering [Mac98] has been used to relax this limitation through
model estimation by importance sampling.



Multiobject association and state estimation are often carried out statistically. When
tracking multiple objects using Kalman or particle filters, the association problem needs
to be solved before these filters can be applied. There are two popular methods for data
association: Joint Probability Data Association Filtering (JPDAF) [CA91] and Multiple
Hypothesis Tracking (MHT) [Rei79]. JPDAF extends the Kalman filter by replacing the
innovation of a single track with a sum of the innovations of multiple measurements, each weighted by the
posterior probability that the measurement is associated with that track. MHT, on the other
hand, iteratively improves associations. In each iteration, the algorithm starts
from a set of current track hypotheses, each in the form of a collection of disjoint tracks. For each
hypothesis, the algorithm predicts each object’s position in the next frame. By comparing
the predictions with actual measurements, associations are established for each hypothesis
and a set of new hypotheses is formed for the next iteration.


2.2 Tracking methods applied to cellular dynamics 

The field of tracking molecular dynamics in cells is relatively new. So far only a small
number of methods have been developed for this purpose. They are summarized as follows.

Sbalzarini et al. [SK05] proposed a tracking method that consists of two steps:
feature detection and trajectory linking. In this approach, proteins are represented as
points. The feature detection step mainly consists of the detection and refinement of points
according to local intensity maxima. Given the detected candidate locations, trajectory
linking then associates the points between each pair of adjacent frames.

Godinez et al. [GLW+07] designed a method for tracking virus particles. The method
consists of virus particle detection and association. In the detection step, they use
Laplacian-of-Gaussian filtering to detect spots. Laplacian-of-Gaussian is an image
filtering technique that applies a Gaussian blur and the Laplacian operator to an image. They
then use Gaussian fitting to enhance the spots. In the association step, they employ
smooth-motion and nearest-neighbor constraints to link detected particles between frames.

Sage et al. [SNH+05] proposed a dynamic programming approach for tracking the
fluorescent markers attached to a single chromosome in a cell. Basically, given a discrete
scalar field, dynamic programming is a computational technique that can be used to
find a curve such that the integral along this curve attains the optimal value. The
advantage of this approach is that an object detection step is not required. We will describe
this approach in detail in the next chapter.

The above approaches are deterministic. A few stochastic tracking approaches have also
appeared recently. For example, Yoon et al. [YBFK08] proposed using a particle filter
to track a single molecule. Simply speaking, when the states of objects are modeled
as a Markov chain, a particle filter approximates the optimal Bayesian estimate of the states given
noisy observations over time. Also using a particle filter, Smal et al. [SDG+07] designed
a method for tracking microtubules in a cell. In their approach, microtubules are modeled
using Gaussian functions for detection. After the molecules are detected, a particle filter
is used to estimate the tracks.

Chapter 3


Tracking of single vesicle 


3.1 Problem formulation 


In this section, we study the tracking of a single vesicle in a video. We solve the single vesicle
tracking problem through an optimization approach, given a sequence of $n$-dimensional
images (normally $n = 2$). Denote by $X \subset \mathbb{R}^n$ the set of all possible locations, which is
identical for all images. Let $f(x, t)$ be the intensity at location $x \in X$ of the image at
time $t$. We want to find a track $x_t$, $t = 1, \ldots, T$, such that the following score function is
maximized:

\[
S_T = \sum_{t=1}^{T} f(x_t, t) - w \sum_{t=2}^{T} \|x_t - x_{t-1}\| \tag{3.1}
\]

Intuitively, this score function tends to be high when the intensity along the track is
high and the displacement is low. These two factors are balanced by the weight $w$.
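As a small illustration of Equation 3.1, the following sketch evaluates the score of a given track in a one-dimensional video. The list-of-lists video layout, the 0-based time index, and the function name `track_score` are assumptions of this sketch.

```python
def track_score(video, track, w):
    """Score of Equation 3.1 for a 1-D video.

    video[t][x] is the intensity at location x in frame t (0-based here),
    track[t] is the location chosen in frame t, and w weights displacement.
    """
    intensity = sum(video[t][x] for t, x in enumerate(track))
    displacement = sum(abs(track[t] - track[t - 1]) for t in range(1, len(track)))
    return intensity - w * displacement

video = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
on_object = track_score(video, [1, 2, 1], w=0.25)   # follows the bright spot
stationary = track_score(video, [1, 1, 1], w=0.25)  # never moves
```

With w = 0.25 the track that follows the spot scores 2.5 against 2.0 for the stationary track. For any w above 0.5 the order would reverse, which shows how the weight trades intensity against displacement.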
3.2 A dynamic programming approach


This optimization problem can be solved using the dynamic programming technique [CLRS01].
The problem can be decomposed into subproblems, and the optimal solutions of the
subproblems can be used to find the optimal solution of the overall problem. Formally, let
$s_t(x)$ be the maximum score over all tracks of length $t$ that end at position $x$. That is:

\[
s_t(x) = \max_{x_1, \ldots, x_{t-1}} \Big\{ f(x_1, 1) + \sum_{r=2}^{t-1} \big[ f(x_r, r) - w \|x_r - x_{r-1}\| \big] + \big[ f(x, t) - w \|x - x_{t-1}\| \big] \Big\} \tag{3.2}
\]

Then $s_t(x)$ can be calculated from $s_{t-1}(\cdot)$ as follows:

\[
s_t(x) = \max_{y \in X} \big[ s_{t-1}(y) + f(x, t) - w \|x - y\| \big] \tag{3.3}
\]

Thus, given $x$, the calculation of $s_t(x)$ automatically identifies the $x_{t-1}$ that
achieves the maximum $s_t(x)$. In addition, $\arg\max_{x \in X} s_T(x)$ gives the location where the best
track terminates at time $T$. This provides the foundation for tracking. The computation of
optimal scores and the trace-back are summarized in Algorithms 1 and 2. A similar
approach has been used by Sage et al. [SNH+05] to track a single particle in noisy
images.
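The recursion of Equation 3.3 and the trace-back can be sketched for a one-dimensional video as follows. This is a minimal illustration rather than the implementation used in the experiments; the function names `dp_scoring` and `dp_traceback`, the list-of-lists layout, and the 0-based indexing are assumptions of the sketch.

```python
def dp_scoring(video, w):
    """Forward pass of Equation 3.3 on a 1-D video (list of frames).

    Returns scores s[t][x] and back-pointers b[t][x]; indices are 0-based,
    so b[0] is unused. The cost is O(T * |X|^2).
    """
    n_frames, n_loc = len(video), len(video[0])
    s = [list(video[0])]
    b = [[None] * n_loc]
    for t in range(1, n_frames):
        s_t, b_t = [], []
        for x in range(n_loc):
            # Best predecessor y for a track that ends at x in frame t.
            y_best = max(range(n_loc), key=lambda y: s[t - 1][y] - w * abs(x - y))
            s_t.append(s[t - 1][y_best] + video[t][x] - w * abs(x - y_best))
            b_t.append(y_best)
        s.append(s_t)
        b.append(b_t)
    return s, b

def dp_traceback(s, b):
    """Follow the back-pointers from the best final location."""
    last = len(s) - 1
    x = max(range(len(s[last])), key=lambda loc: s[last][loc])
    track = [x]
    for t in range(last, 0, -1):
        x = b[t][x]
        track.append(x)
    return track[::-1]

# A bright spot that moves one pixel to the right per frame.
video = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
s, b = dp_scoring(video, w=0.25)
track = dp_traceback(s, b)
```

For this noise-free input the recovered track is the path of the spot, with a final score of 3 − 2 × 0.25 = 2.5. No detection step is involved: every location is a candidate in every frame.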


Algorithm 1 DPScoring

Input:  (1) $X$, the set of all locations
        (2) $T$, the total number of time points
        (3) $f(x, t)$, the intensity function of the image sequence over $T$ time points
Output: (1) $s_t(x)$, the maximum score over all tracks that terminate at $x$ at time $t$
        (2) $b_t(x)$, the trace-back function that gives the location at time $t - 1$ of the
            track corresponding to $s_t(x)$

for all $x$ in $X$ do
    $s_1(x) \leftarrow f(x, 1)$
    {In the discrete-space case, $s_t(x)$ is an $n$-dimensional array whose elements are
     determined by this algorithm. The same holds for $b_t(x)$.}
end for
for $t = 2$ to $T$ do
    for all $x$ in $X$ do
        $s_t(x) \leftarrow \max_{y \in X} [s_{t-1}(y) + f(x, t) - w \|x - y\|]$
        $b_t(x) \leftarrow \arg\max_{y \in X} [s_{t-1}(y) + f(x, t) - w \|x - y\|]$
    end for
end for

Algorithm 2 DPTraceback

Input:  (1) $s_t(x)$, the maximum score over all tracks that terminate at $x$ at time $t$
        (2) $b_t(x)$, the trace-back function that gives the location at time $t - 1$ of the
            track corresponding to $s_t(x)$
Output: $x_1, \ldots, x_T$, the locations in the track that achieves the maximum score $s_T$

$x_T \leftarrow \arg\max_{x \in X} s_T(x)$
for $t = T - 1$ down to 1 do
    $x_t \leftarrow b_{t+1}(x_{t+1})$
end for


3.3 Experimental results

Figure 3.1 shows the results of applying the dynamic programming approach. From
left to right, the sub-figures correspond to experiments 1, 2, and 3. The two red curves
indicate the true positions of the objects. The green curve indicates the inferred positions of
a track obtained with the dynamic programming approach. The objects move up and down, and
the video proceeds from left to right.

In the first experiment, we assume $n = 1$. That is, the object moves in a
one-dimensional discrete space. Each image can then be represented as a column vector
(of length 200), and the images are put together from left to right in order of time.
The video contains two objects moving over 200 time points. The objects start from
fixed positions, and their movements are modeled as discretized Brownian motion, i.e. the
displacement between frames follows a normal distribution $N(0, 2)$. The true intensity of
the objects is 0.5, and the noise follows a normal distribution $N(0.2, 0.2)$.

Because the dynamic programming approach does not assume any movement model,
it can be used to track more complicated movements. In the second experiment, in addition
to the setting of the first experiment, we add a constant shift of −0.2 to the motion, i.e. the motion
follows $N(-0.1, 2)$. In the third experiment, we assume there is a constant acceleration of
−0.003, i.e. the motion follows $N(-0.003t, 2)$. In all three experiments, the tracking
algorithm correctly inferred one of the tracks.
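A simulation of this kind can be sketched as follows. Several details are not specified in the text and are assumptions of the sketch: the start positions (60 and 140), reading the second parameter of N(·, ·) as a standard deviation, adding the object intensity on top of the noise, clamping positions at the image border, and the function name `simulate_video`. The drift of the second experiment uses the mean −0.1 of the stated distribution.

```python
import random

def simulate_video(n_loc=200, n_frames=200, starts=(60, 140), step_sd=2.0,
                   drift=lambda t: 0.0, signal=0.5, noise_mean=0.2, noise_sd=0.2,
                   seed=0):
    """Simulate a noisy 1-D video with objects following a discretized random walk.

    Returns (video, tracks), where video[t][x] is the intensity and
    tracks[k][t] is the true location of object k in frame t.
    """
    rng = random.Random(seed)
    video = [[rng.gauss(noise_mean, noise_sd) for _ in range(n_loc)]
             for _ in range(n_frames)]
    tracks = []
    for start in starts:
        pos, track = start, []
        for t in range(n_frames):
            if t > 0:
                step = round(rng.gauss(drift(t), step_sd))
                pos = min(max(pos + step, 0), n_loc - 1)  # stay inside the image
            track.append(pos)
            video[t][pos] += signal
        tracks.append(track)
    return video, tracks

video1, tracks1 = simulate_video()                            # experiment 1: pure random walk
video2, tracks2 = simulate_video(drift=lambda t: -0.1)        # experiment 2: constant shift
video3, tracks3 = simulate_video(drift=lambda t: -0.003 * t)  # experiment 3: constant acceleration
```

Passing the drift as a function of time keeps the three experiments within one routine, and fixing the seed makes a run reproducible.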


To obtain another track, it is possible to trace back from the position with the second
best score. In practice, however, we generally do not know the number of objects. Also,
objects may emerge or disappear during image recording.



Figure 3.1: Examples of tracking two moving objects in a noisy video. 


3.3.1 Tracking accuracy 

Performance measure: we use the Root Mean Squared Error (RMSE) to measure the
tracking accuracy [SNH+05]. This measure is defined as
$\mathrm{RMSE}(\hat{x}) = \sqrt{E\left[\|\hat{x}_t - x_t\|^2\right]}$,
which measures the difference between the estimated track $\hat{x}$ and the true track $x$. Here the
expectation is simply approximated by averaging over the realizations across time.
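For one-dimensional tracks, this measure can be computed as in the sketch below, where the expectation is replaced by the average over time points. The function name `rmse` is an assumption of the sketch.

```python
from math import sqrt

def rmse(estimated, true):
    """Root mean squared error between two 1-D tracks of equal length,
    with the expectation replaced by the average over time points."""
    if len(estimated) != len(true):
        raise ValueError("tracks must have the same length")
    return sqrt(sum((e - x) ** 2 for e, x in zip(estimated, true)) / len(true))

# Squared errors 0, 1, 4, 0 give sqrt(5 / 4).
error = rmse([10, 12, 11, 15], [10, 11, 13, 15])
```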

Average performance on different parameters and configurations: Starting
from the setting of experiment 1 in Figure 3.1, we simulated the data and performed tracking while 1)
using different norms (specified by different powers) in the dynamic programming track
estimation algorithm, 2) using different weights $w$, 3) adding different amounts of noise to
the images, and 4) using different variances in the Brownian motion.

For each configuration, the simulation and tracking are repeated 100 times and the
average RMSE is calculated. Since there are two objects in the model but only one track
is estimated, for each simulation the smaller of the two RMSEs is chosen for averaging.
The results are summarized in Figure 3.2. Note that the scales of the plots
differ.

It can be seen from the figure that, in general, 1) the tracking is insensitive to the choice
of norm; 2) the smaller the magnitude of the weight, the more accurate the tracking; 3) the
tracking performance is similar when the level of noise (indicated by the mean and standard
deviation of the normal distribution of the noise) is less than 0.3, but degrades quickly when
the noise level increases beyond 0.3; 4) the faster the movement of the objects (indicated by the
standard deviation of the Brownian motion), the higher the tracking error.

Selection of the optimal track: In experiment 1 of Figure 3.1, only one of
the two tracks is selected. To measure the contribution of noise to the choice of one track
over the other, we fixed the object tracks as in the experiment and repeated the tracking
of the videos 1000 times with different instances of noise. We found that the lower track
is selected with a frequency of 39.8% and the upper one with a frequency of 60.2%.

Figure 3.2: Performance analysis of different settings.

The somewhat higher chance of the upper track being selected is
probably due to the shapes of the two tracks: the upper track has a smaller overall drift
than the lower track. In such a case, noise near the track, if realized with a high value,
can be included in the estimated track to reduce the cost of displacement, thereby
generating a higher dynamic programming score.

To demonstrate this effect, we repeated the above experiment on a new set of videos (of
200 pixels in size and 50 time points) containing two objects. The displacement
of both objects between two consecutive frames is 1. One object keeps changing its
moving direction and therefore follows a zigzag track, while the other object does not
change direction and therefore follows a linear track. Figure 3.3 shows a noise-free video
that contains these two tracks. Without noise, the two tracks would
result in the same dynamic programming score. However, the noise makes positive
contributions to tracks with small drift. In our experiment, the
two tracks are selected with frequencies of 68.8% and 31.2% respectively.

Figure 3.3: Simulated noise-free video that consists of one zigzag track and one linear
track.

Chapter 4


Tracking of multiple vesicles 


4.1 Track elimination and enumeration 


The dynamic programming approach can only generate the track of one object. To track
multiple objects, in practice, one can estimate multiple tracks by repetitively
eliminating the signal along the track inferred by dynamic programming. The
following MultiTrack algorithm gives an example of generating multiple tracks.


Algorithm 3 MultiTrack

Input:  (1) $n_{trk}$, the number of tracks to enumerate
        (2) $X$, the set of all locations
        (3) $T$, the total number of time points
        (4) $f(x, t)$, the intensity function of the image sequence over $T$ time points
Output: $(x_1^{(1)}, \ldots, x_T^{(1)}), \ldots, (x_1^{(n_{trk})}, \ldots, x_T^{(n_{trk})})$, the best tracks

for $i = 1$ to $n_{trk}$ do
    $(s, b) \leftarrow$ DPScoring($X$, $T$, $f$)
    $(x_1^{(i)}, \ldots, x_T^{(i)}) \leftarrow$ DPTraceback($s$, $b$)
    $f \leftarrow$ TrackElimination($X$, $T$, $f$, $(x_1^{(i)}, \ldots, x_T^{(i)})$)
end for


MultiTrack calls the TrackElimination algorithm to eliminate a track from the video.
It is described as follows.

Algorithm 4 TrackElimination

Input:  (1) $X$, the set of all locations
        (2) $T$, the total number of time points
        (3) $f(x, t)$, the intensity function of the image sequence over $T$ time points
        (4) $(x_1, \ldots, x_T)$, one track
Output: $f(x, t)$, the intensity function of the image sequence with the track eliminated

for $t = 1$ to $T$ do
    $x' \leftarrow$ a random location in $X$
    $f(x_t, t) \leftarrow f(x', t)$
end for
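The interplay of track enumeration and track elimination can be sketched for a one-dimensional video as below. The compact `best_track` helper combines scoring and trace-back in one function; it and the other names (`eliminate_track`, `multi_track`), the explicit weight argument, and the seeded random generator are assumptions of this sketch rather than part of the algorithms above.

```python
import random

def best_track(video, w):
    """Single best track by dynamic programming (Equation 3.3 plus trace-back)."""
    n_loc = len(video[0])
    score, back = list(video[0]), []
    for frame in video[1:]:
        prev = [max(range(n_loc), key=lambda y: score[y] - w * abs(x - y))
                for x in range(n_loc)]
        score = [score[prev[x]] + frame[x] - w * abs(x - prev[x]) for x in range(n_loc)]
        back.append(prev)
    x = max(range(n_loc), key=lambda loc: score[loc])
    track = [x]
    for prev in reversed(back):
        x = prev[x]
        track.append(x)
    return track[::-1]

def eliminate_track(video, track, rng):
    """Overwrite each pixel on the track with the value of a random pixel
    of the same frame; returns a new video and leaves the input untouched."""
    cleaned = [list(frame) for frame in video]
    for t, x in enumerate(track):
        cleaned[t][x] = video[t][rng.randrange(len(video[t]))]
    return cleaned

def multi_track(video, n_trk, w, seed=0):
    """Enumerate n_trk tracks by repeated tracking and elimination."""
    rng = random.Random(seed)
    tracks = []
    for _ in range(n_trk):
        track = best_track(video, w)
        tracks.append(track)
        video = eliminate_track(video, track, rng)
    return tracks

# Two stationary objects: a brighter one at pixel 1 and a dimmer one at pixel 6.
video = [[1.0 if x == 1 else 0.8 if x == 6 else 0.0 for x in range(8)]
         for _ in range(5)]
tracks = multi_track(video, n_trk=2, w=0.5)
```

The first enumerated track is that of the brighter object. Because the replacement pixel is drawn at random from the whole frame, it can occasionally be a bright pixel again, so in a tiny example like this one the second pass is not guaranteed to find the dimmer object; in a large noisy frame the chance of drawing an object pixel is small.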


For simplicity, the above algorithm stops only when a fixed number $n_{trk}$ of tracks has been
obtained. A more rigorous stopping criterion may be obtained by comparing the scoring
function $s_t$ obtained from the true video against $s_t$ from a permuted video generated by
randomly shuffling its pixels.

In addition, the above TrackElimination algorithm assumes that the object occupies only one
pixel. In practice, an object on the track may occupy a small contiguous region instead
of just one pixel, and we can assume that the track is the trajectory of the center of the
object. To remove this object, we can approximate it by a Gaussian function
and subtract the values of this function from the image. To be more specific, denote
the Gaussian function by $f_t(x) = a_t \exp\left(-(x - x_t)^\top \Sigma_t^{-1} (x - x_t)\right)$, where the $x_t$ are obtained
from the trace-back information, while $a_t$ and $\Sigma_t$ are estimated so that $f_t$ gives the
best least-squares fit to the image. McKenna et al. used a similar idea to model
object intensities for tracking [MRG99]: they model the pixel intensities of an
object as random variables that follow a bivariate Gaussian distribution and then use
expectation maximization to estimate the mean and covariance matrix of the Gaussian
distribution, obtaining the best approximation of the object.
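A one-dimensional version of this Gaussian elimination can be sketched as follows. For a fixed width the least-squares amplitude has a closed form, and the width is picked from a small candidate grid. The grid, the fit over the whole frame rather than a local window, and the function name `fit_and_subtract_gaussian` are assumptions of the sketch.

```python
from math import exp

def fit_and_subtract_gaussian(frame, center, widths=(0.5, 1.0, 2.0, 4.0, 8.0)):
    """Least-squares fit of a * exp(-(x - center)^2 / width) to a 1-D frame,
    followed by subtraction of the fitted blob.

    Returns (cleaned_frame, amplitude, width).
    """
    best = None
    for width in widths:
        g = [exp(-((x - center) ** 2) / width) for x in range(len(frame))]
        # Closed-form least-squares amplitude for this width.
        amplitude = sum(gi * fi for gi, fi in zip(g, frame)) / sum(gi * gi for gi in g)
        residual = sum((fi - amplitude * gi) ** 2 for gi, fi in zip(g, frame))
        if best is None or residual < best[0]:
            best = (residual, amplitude, width, g)
    _, amplitude, width, g = best
    cleaned = [fi - amplitude * gi for gi, fi in zip(g, frame)]
    return cleaned, amplitude, width

# A noise-free blob of amplitude 0.5 and width 2.0 centered at pixel 10.
frame = [0.5 * exp(-((x - 10) ** 2) / 2.0) for x in range(21)]
cleaned, amplitude, width = fit_and_subtract_gaussian(frame, center=10)
```

In two dimensions the scalar width would become the covariance matrix, and the grid search could be replaced by a general nonlinear least-squares routine.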
4.1.1 Experimental results


An example of applying TrackElimination to the simulation generated in Section 3.3 is 
shown in Figure 4.1. 



Figure 4.1: Left: Simulated video used in Section 3.3 (same as Figure 3.1 without the
estimated track). Right: the video with the best track eliminated.

The accuracy of the second track after eliminating the first track: We use the following
steps to test the tracking performance for the second object: 1)
simulate videos using the same model as in Section 3.3; 2) use algorithms DPScoring and
DPTraceback to obtain the track of one object; 3) use TrackElimination to eliminate the
track from the video; 4) use DPScoring and DPTraceback to obtain the second estimated
track. We repeated the above steps 1000 times. The RMSE of the first tracking,
from step 2, is 2.29 ± 0.89, and the average RMSE of the second tracking, from step
4, is 2.30 ± 0.55. These two RMSEs are comparable. We conclude that the track
elimination strategy can successfully estimate the tracks of multiple objects.
4.2 Improving track associations using Kalman filter


In this section, we propose to improve the tracking accuracy by incorporating the Kalman
filter into the dynamic programming framework.

4.2.1 Motivation 

In practice, when a video contains multiple moving objects, the trajectories of these
objects can often get very close to or even cross each other. In this case, the previous
dynamic programming approach may infer a track that is actually a mixture of multiple
real tracks. Figure 4.2 shows such an example, resulting from a video of 100 pixels and
100 time points. In this example, the real tracks (indicated by red lines) of the two objects
are two crossing straight lines plus small random displacements following a discretized normal
distribution. The green curve corresponds to the estimated track. It can be seen in this
figure that the estimated track is actually a combination of the first part of one track and
the second part of the other track. To avoid this problem, we modify the score function
in Equation 3.1 to incorporate a Kalman filter, which provides an estimate of the current object
state.

Figure 4.2: Wrong tracking of two objects due to the crossing of their real tracks.

4.2.2 Introduction to Kalman filter

Kalman filters are based on linear dynamical systems in the discrete time domain. Let the
state of the system at time $t$ be represented as a real vector $s_t$. The Kalman filter model
assumes that the true state at time $t$ evolves from the state at $t - 1$ according to

\[
s_t = F_t s_{t-1} + B_t u_t + w_t \tag{4.1}
\]

where

• $F_t$ is the state transition operator applied to the previous state $s_{t-1}$;

• $B_t$ is the control-input operator applied to the external control vector $u_t$;

• $w_t$ is the process noise. It is assumed to be sampled from a zero-mean multivariate
normal distribution with covariance $Q_t$.

At time $t$, an indirect measurement $z_t$ of the true state $s_t$ is observed according to

\[
z_t = H_t s_t + v_t \tag{4.2}
\]

where $H_t$ is the operator that maps the true state space into the observed space
and $v_t$ is the observation noise. The noise is assumed to be zero-mean Gaussian white
noise with covariance $R_t$:

\[
v_t \sim N(0, R_t) \tag{4.3}
\]

In addition, the initial state and the noise vectors at each step, $\{s_0, w_1, \ldots, w_t, v_1, \ldots, v_t\}$,
are all assumed to be mutually independent.

The Kalman filter method consists of two phases: predict and update. In the predict
phase, the current state estimate is generated from the previous state estimate:

• Predicted state $\hat{s}_{t|t-1} = F_t \hat{s}_{t-1|t-1} + B_t u_t$

• Predicted estimate covariance $P_{t|t-1} = F_t P_{t-1|t-1} F_t^\top + Q_{t-1}$

In the update phase, the currently observed measurement is used to refine
the prediction:

• Innovation or measurement residual $\tilde{y}_t = z_t - H_t \hat{s}_{t|t-1}$

• Innovation (or residual) covariance $S_t = H_t P_{t|t-1} H_t^\top + R_t$

• Optimal Kalman gain $K_t = P_{t|t-1} H_t^\top S_t^{-1}$

• Updated state estimate $\hat{s}_{t|t} = \hat{s}_{t|t-1} + K_t \tilde{y}_t$

• Updated estimate covariance $P_{t|t} = (I - K_t H_t) P_{t|t-1}$
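The two phases can be sketched for a two-dimensional state (position and velocity) as follows. To keep the example short, it assumes no control input and a scalar measurement of the position only, i.e. an observation operator equal to the row vector [1, 0] with measurement variance `r`. These simplifications, the function names, and the numerical values are assumptions of the sketch.

```python
def kalman_predict(s, P, F, Q):
    """Predict phase for a 2-D state: s_pred = F s, P_pred = F P F^T + Q."""
    s_pred = [F[0][0] * s[0] + F[0][1] * s[1],
              F[1][0] * s[0] + F[1][1] * s[1]]
    FP = [[sum(F[i][k] * P[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    P_pred = [[sum(FP[i][k] * F[j][k] for k in range(2)) + Q[i][j] for j in range(2)]
              for i in range(2)]
    return s_pred, P_pred

def kalman_update(s_pred, P_pred, z, r):
    """Update phase for a scalar position measurement z with H = [1, 0]."""
    innovation = z - s_pred[0]                 # z - H s_pred
    S = P_pred[0][0] + r                       # H P_pred H^T + R
    K = [P_pred[0][0] / S, P_pred[1][0] / S]   # P_pred H^T S^-1
    s_new = [s_pred[0] + K[0] * innovation, s_pred[1] + K[1] * innovation]
    # (I - K H) P_pred, written out element by element.
    P_new = [[P_pred[0][0] - K[0] * P_pred[0][0], P_pred[0][1] - K[0] * P_pred[0][1]],
             [P_pred[1][0] - K[1] * P_pred[0][0], P_pred[1][1] - K[1] * P_pred[0][1]]]
    return s_new, P_new

F = [[1.0, 1.0], [0.0, 1.0]]           # constant-velocity transition, time step 1
Q = [[0.0025, 0.005], [0.005, 0.01]]   # illustrative process noise covariance
s, P = [0.0, 0.0], [[1000.0, 0.0], [0.0, 1000.0]]
for z in range(1, 21):                 # noise-free positions 1, 2, ..., 20
    s, P = kalman_predict(s, P, F, Q)
    s, P = kalman_update(s, P, float(z), r=0.04)
```

Starting from a vague initial covariance, the filter locks onto the constant velocity within a few measurements; after the loop the estimated position is close to 20 and the estimated velocity close to 1.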

4.2.3 Improving tracking by incorporating Kalman filter 

General idea: We incorporate the Kalman filter into our dynamic programming framework
as follows. Let $z_t$ be the vector of the observed state of an object, which is usually a
combination of the object’s location, velocity, etc. Assume $z_t$ is observable at all times. The Kalman
filter can provide an estimate $\hat{z}_{t|t-1}$ of the object state at time $t$, given observations up
to time $t - 1$. Suppose $z_t$ is of length $m$. Equation 4.4 gives a modified scoring function:

\[
S_T = \sum_{t=1}^{T} f(z_t, t) - \sum_{i=1}^{m} \left[ w \sum_{t=2}^{T} \left\| z_t^{(i)} - \hat{z}_{t|t-1}^{(i)} \right\| \right] \tag{4.4}
\]

where $z_t^{(i)}$ is the $i$th element of $z_t$, and $f(z_t, t)$ uses only the location elements of $z_t$.

Design of the dynamic model: We propose a simple design of the dynamic model for
the construction of the Kalman filter for videos of one-dimensional images. In this case,
the true location $x_t$ is a scalar. Assume there is no control on the objects, so we have
$B_t = 0$ and $u_t = 0$. Also, assume $F$, $H$, $R$, and $Q$ are time invariant. We define the state
vector as the location and velocity of a vesicle:

\[
s_t = \begin{bmatrix} x_t \\ \dot{x}_t \end{bmatrix} \tag{4.5}
\]


We assume that between timesteps $t-1$ and $t$ the vesicle undergoes a constant
acceleration $a_t$ that is normally distributed, with mean 0 and standard deviation $\sigma_a$.
Assuming object motion follows Newton's laws, we have

$s_t = F s_{t-1} + G a_t$ (4.6)

where

$F = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}$ (4.7)

and

$G = \begin{bmatrix} \Delta t^2 / 2 \\ \Delta t \end{bmatrix}$ (4.8)

with $\Delta t = 1$. We find that

$Q = \mathrm{cov}(Ga) = E[(Ga)(Ga)^T] = G E[a^2] G^T = G [\sigma_a^2] G^T = \sigma_a^2 G G^T$ (4.9)
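
For reference, Equation 4.9 can be expanded entry by entry. The expansion below assumes the usual constant-acceleration form $G = [\Delta t^2/2, \ \Delta t]^T$ and then substitutes $\Delta t = 1$; it is a worked illustration, not an additional modelling assumption.

```latex
\begin{aligned}
G G^T &= \begin{bmatrix} \Delta t^2/2 \\ \Delta t \end{bmatrix}
         \begin{bmatrix} \Delta t^2/2 & \Delta t \end{bmatrix}
       = \begin{bmatrix} \Delta t^4/4 & \Delta t^3/2 \\ \Delta t^3/2 & \Delta t^2 \end{bmatrix}, \\
Q &= \sigma_a^2 \, G G^T
   = \sigma_a^2 \begin{bmatrix} 1/4 & 1/2 \\ 1/2 & 1 \end{bmatrix}
   \quad \text{for } \Delta t = 1 .
\end{aligned}
```

The off-diagonal entries are nonzero because the same random acceleration drives both the position and the velocity error within a timestep.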

At each time step, a noisy measurement of the true position of the vesicle is made. Assume
the noise is also normally distributed, with mean 0 and standard deviation $\sigma_z$:

$z_t = H s_t + v_t$ (4.10)

where

$H = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ (4.11)

and

$R = E[v_t v_t^T]$ (4.12)


In dynamic programming, we assume the starting state of the vesicle is known, so we
initialize

$\hat{s}_{0|0} = \begin{bmatrix} x_0 \\ \dot{x}_0 \end{bmatrix}$ (4.13)

and, to tell the filter that this position and speed are nevertheless not exact, we give it a
diagonal covariance matrix

$P_{0|0} = \begin{bmatrix} B & 0 \\ 0 & B \end{bmatrix}$ (4.14)

with some large number $B$. The filter will then prefer the information from the first
measurements over the information already in the model. Given the above dynamic
model, we modify the algorithm DPScoring by replacing the displacement with the object
position predicted by the Kalman filter, as in Equation 4.4, where

$\hat{z}_{t|t-1} = \begin{bmatrix} \hat{x}_{t|t-1} \\ 0 \end{bmatrix}$ (4.15)
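
The effect of this replacement can be seen in a small sketch that compares the two penalties for a single transition. It assumes that the displacement term of DPScoring has the form $w \|a - x_{t-1}\|$ for a candidate location $a$, and that the prediction comes from the constant-velocity model with $\Delta t = 1$; the function names and numbers are hypothetical.

```python
def displacement_penalty(candidate, previous_position, w):
    """Penalty on the distance from the previous location (assumed original form)."""
    return w * abs(candidate - previous_position)

def kalman_penalty(candidate, state_estimate, w, dt=1.0):
    """Penalty on the distance from the position predicted by the dynamic model."""
    position, velocity = state_estimate
    predicted_position = position + dt * velocity  # first row of F applied to the state
    return w * abs(candidate - predicted_position)

# A vesicle estimated at x = 10 moving at +2 pixels per frame.
state_estimate = (10.0, 2.0)
same_track = 12       # where this vesicle is expected to be in the next frame
crossing_track = 10   # a second vesicle that happens to sit at the old location

old_same = displacement_penalty(same_track, state_estimate[0], w=1.0)
old_cross = displacement_penalty(crossing_track, state_estimate[0], w=1.0)
new_same = kalman_penalty(same_track, state_estimate, w=1.0)
new_cross = kalman_penalty(crossing_track, state_estimate, w=1.0)
```

With the displacement penalty the crossing vesicle is the cheaper continuation, whereas the prediction-based penalty favours the location that is consistent with the estimated velocity.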


4.2.4 Combining dynamic programming and point detector 

We compare the above approach, which integrates the Kalman filter with dynamic
programming, with an approach that directly uses the Kalman filter to associate the object
states estimated by object detectors. Since the protein vesicles in our case are small, we
regard them as points and use a point detector. A simple point detector that is robust to the
noise in the images is based on the Gaussian filter. Let $\mathcal{G}(x, \Sigma)$ be a Gaussian function with
covariance matrix $\Sigma$. Gaussian filtering is the convolution of an image (at time $t$) with
the Gaussian function:

$g(x, t) = \int I(y, t) \, \mathcal{G}(x - y, \Sigma) \, dy$

After applying the Gaussian filter, the noise in the image is reduced, and the pixels of
$g(x, t)$ with high intensities may correspond to objects.

Given the detected possible locations of an object at the first time point, we can then use a
greedy approach based on the Kalman filter to associate the locations into a track, as shown in
the following algorithm.


Algorithm 5 DectorKalman

Input: (1) $X$, the set of all locations

(2) $T$, the total number of time points

(3) $g(x, t)$, the Gaussian-filtered intensity function of the image sequence of
$T$ time points

(4) $F, G, Q, H, R$, the parameters of the dynamic model

Output: $y_1, \ldots, y_T$, the locations in the estimated track

1: $y_1 \leftarrow \arg\max_{a \in X} g(a, 1)$;

2: for $t = 2$ to $T$ do

3: calculate $\hat{z}_{t|t-1}$ using the Kalman filter

4: $y_t \leftarrow \arg\max_{a \in X} [g(a, t) - w \| \hat{z}_{t|t-1} - a \|]$;

5: end for
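
The following Python sketch shows one way the detector and the greedy association of Algorithm 5 could be put together for one-dimensional frames. It rests on several assumptions that are not stated in the algorithm: the Gaussian kernel is truncated at three standard deviations with zero padding, only the position is measured (the first row of $H$), the initial velocity is 0, the location chosen at each step is fed back to the filter as the measurement, and the weight `w = 1.0` is arbitrary. The function names are hypothetical.

```python
import math

def gaussian_filter_1d(frame, sigma=1.0):
    """Smooth one 1-D frame with a truncated, normalised Gaussian kernel."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(kernel)
    kernel = [k / total for k in kernel]
    n = len(frame)
    smoothed = []
    for x in range(n):
        acc = 0.0
        for offset, weight in zip(offsets, kernel):
            y = x + offset
            if 0 <= y < n:  # pixels outside the frame count as zero
                acc += weight * frame[y]
        smoothed.append(acc)
    return smoothed

def detector_kalman(frames, w=1.0, sigma_a=0.01, sigma_z=1.0, B=1.0, sigma=1.0):
    """Greedy association in the spirit of Algorithm 5 (1-D frames, dt = 1)."""
    g = [gaussian_filter_1d(frame, sigma) for frame in frames]
    locations = range(len(frames[0]))
    # Line 1: start at the brightest filtered pixel of the first frame.
    y = max(locations, key=lambda a: g[0][a])
    track = [y]
    x, v = float(y), 0.0        # state estimate (position, velocity)
    p11, p12, p22 = B, 0.0, B   # entries of the symmetric covariance P
    q11, q12, q22 = (sigma_a ** 2 * c for c in (0.25, 0.5, 1.0))  # Q = sigma_a^2 G G^T
    r = sigma_z ** 2            # measurement noise variance
    for t in range(1, len(frames)):
        # Line 3: predict with F = [[1, 1], [0, 1]].
        x_pred, v_pred = x + v, v
        a11 = p11 + 2 * p12 + p22 + q11
        a12 = p12 + p22 + q12
        a22 = p22 + q22
        # Line 4: trade filtered intensity against distance to the prediction.
        y = max(locations, key=lambda a: g[t][a] - w * abs(x_pred - a))
        track.append(y)
        # Update the filter, treating the chosen location as the measurement.
        s = a11 + r
        k1, k2 = a11 / s, a12 / s
        residual = y - x_pred
        x, v = x_pred + k1 * residual, v_pred + k2 * residual
        p11, p12, p22 = (1 - k1) * a11, (1 - k1) * a12, a22 - k2 * a12
    return track

# Noise-free toy video: one bright point moving one pixel per frame.
frames = [[10.0 if x == 2 + t else 0.0 for x in range(10)] for t in range(5)]
track = detector_kalman(frames)
```

Because each location is chosen greedily and then trusted as a measurement, a single wrong detection in a noisy frame also corrupts the subsequent predictions.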


4.2.5 Experimental results 

In our experiment, we assume the protein vesicles are under a small amount of acceleration
and therefore choose $\sigma_a = 0.01$. We assume $\sigma_z = 1$ and $B = 1$, and we set $\Delta t = 1$. Given
these parameters, we tested the new scoring method on the example data shown
in Figure 4.2. Figure 4.3 shows that the method correctly finds one of the two tracks.

Figure 4.3: Overcoming the wrong association problem by using Kalman filter.

Given the same true tracks as in the above example, we simulated 1000 noisy videos
and compared the tracking performance of 1) dynamic programming with the
Kalman filter, 2) dynamic programming without the Kalman filter, and 3) the point detector
with the Kalman filter (with $\sigma = 1$ for Gaussian filtering). The first method gives an RMSE
of 2.9 ± 2.1, the second an RMSE of 8.3 ± 5.8, and the third an RMSE of
17.6 ± 14.7. This suggests that, compared to the pure dynamic programming approach,
integrating the Kalman filter with dynamic programming greatly reduces the
association mistakes induced by the crossing of two tracks. On the other hand, using the Kalman
filter alone relies substantially on the accuracy of the point detector. In our case, however,
the objects are very small and the noise is very strong, so it is very hard to detect the
objects from single images even after filtering. The third approach therefore resulted in very
inaccurate tracking.

Chapter 5


Conclusion and discussion 


Automatic tracking of the movements of protein vesicles is key to qualitative analysis of the
dynamics of these vesicles. The main challenge of such tracking is that the video data
are very noisy and the vesicles are very small. In this thesis, after providing an overview
of the field of object tracking and its application to the tracking of molecules in cells,
we studied the tracking of single and multiple vesicles using approaches based on dynamic
programming and the Kalman filter. Our experiments on simulated data show that the dynamic
programming approach can achieve high tracking accuracy for single-vesicle tracking even
when there are high levels of noise in the video, and that integrating the Kalman filter
significantly increases tracking accuracy when tracking multiple vesicles.

Due to the complexity of vesicle movements, many issues in such tracking remain
to be explored. For example, all methods used in this thesis assume that the vesicles
exist in all video frames. In real videos, vesicles could emerge or disappear in
some frames, and they may also split or merge. More complex association
methods, such as Multiple Hypothesis Tracking or particle filters, may therefore be needed to
handle these situations.